Some Training Subset Selection Methods for Supervised Learning in Genetic Programming

Authors

  • Chris Gathercole
  • Peter Ross
Abstract

When using the Genetic Programming (GP) algorithm on a difficult problem with a large set of training cases, a large population size is needed and a very large number of function-tree evaluations must be carried out. This paper describes how to reduce the number of such evaluations by selecting a small subset of the training data set on which to actually carry out the GP algorithm. The three subset selection methods described in the paper are: Dynamic Subset Selection (DSS), which uses the current GP run to select 'difficult' and/or disused cases; Historical Subset Selection (HSS), which uses previous GP runs; and Random Subset Selection (RSS). GP, GP+DSS, GP+HSS, and GP+RSS are compared on a large classification problem. GP+DSS can produce better results in less than 20% of the time taken by GP. GP+HSS can nearly match the results of GP, and, perhaps surprisingly, GP+RSS can occasionally approach the results of GP. GP and GP+DSS are then compared on a smaller problem, and a hybrid Dynamic Fitness Function (DFF), based on DSS, is proposed.
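The abstract only says that DSS favours 'difficult' and/or disused cases and that RSS picks cases at random; it does not give the weighting rule. The Python sketch below is one possible reading, not the authors' implementation. The function names, the per-case `difficulty` and `age` counters, the exponents `d_exp` and `a_exp`, and the reset rule in `update_bookkeeping` are all assumptions introduced for illustration.

```python
import random


def random_subset(n_cases, subset_size, rng=None):
    """RSS-style selection: a uniform random subset of case indices."""
    rng = rng or random.Random()
    return rng.sample(range(n_cases), min(subset_size, n_cases))


def select_subset(difficulty, age, subset_size, d_exp=1.0, a_exp=1.0, rng=None):
    """DSS-style selection (assumed weighting): draw cases without
    replacement, biased toward high difficulty and high age."""
    rng = rng or random.Random()
    remaining = list(range(len(difficulty)))
    # Assumed weight: difficulty^d_exp + age^a_exp; the epsilon keeps every weight positive.
    weights = [1e-9 + difficulty[i] ** d_exp + age[i] ** a_exp for i in remaining]
    chosen = []
    for _ in range(min(subset_size, len(remaining))):
        pick = rng.choices(range(len(remaining)), weights=weights, k=1)[0]
        chosen.append(remaining.pop(pick))
        weights.pop(pick)
    return chosen


def update_bookkeeping(difficulty, age, chosen, misclassified_counts):
    """Assumed bookkeeping after one generation: selected cases get a fresh
    difficulty score and age 1; every unselected case grows one step older."""
    chosen_set = set(chosen)
    for i in range(len(age)):
        if i in chosen_set:
            difficulty[i] = misclassified_counts.get(i, 0)
            age[i] = 1
        else:
            age[i] += 1
```

In a GP loop under these assumptions, `select_subset` would be called once per generation, the population would be evaluated only on the returned cases, and `update_bookkeeping` would then record how many trees misclassified each selected case.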

Similar resources

Dynamic Training Subset Selection for Supervised Learning in Genetic Programming

When using the Genetic Programming (GP) algorithm on a difficult problem with a large set of training data, a large population size is needed and a very large number of function-tree evaluations must be carried out. This paper describes some efforts made to reduce the number of such evaluations by concentrating on selecting a small subset of the training data set on which to actually carry out th...

Towards Efficient Training on Large Datasets for Genetic Programming

Genetic programming (GP) has the potential to provide unique solutions to a wide range of supervised learning problems. The technique, however, does suffer from a widely acknowledged computational overhead. As a consequence, applications of GP are often confined to datasets consisting of hundreds of training exemplars as opposed to tens of thousands of exemplars, thus limiting the widespread app...

A Parallel Genetic Algorithm Based Method for Feature Subset Selection in Intrusion Detection Systems

Intrusion detection systems are designed to provide security in computer networks, so that if an attacker gets past other security devices, they can detect and prevent the attack. One of the most essential challenges in designing these systems is the so-called curse of dimensionality. Therefore, in order to obtain satisfactory performance in these systems we have to take advantage of app...

Comparative Study of Attribute Selection Using Gain Ratio and Correlation Based Feature Selection

Feature subset selection is of great importance in the field of data mining. High-dimensional data makes the testing and training of general classification methods difficult. In the present paper, two filter approaches, namely Gain Ratio and Correlation-based Feature Selection, have been used to illustrate the significance of feature subset selection for classifying Pima Indian diabetic database (P...

Publication date: 1994